[None][feat] Add prefix-aware scheduling config flag to support opt-out#15526
[None][feat] Add prefix-aware scheduling config flag to support opt-out#15526SimengLiu-nv wants to merge 3 commits into
Conversation
|
/bot run --disable-fail-fast |
|
PR_Github #55084 [ run ] triggered by Bot. Commit: |
📝 WalkthroughWalkthroughIntroduces a new ChangesPrefix-Aware Scheduling Flag
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
tests/unittest/llmapi/test_llm_args.py (1)
478-503: 📐 Maintainability & Code Quality | 🔵 TrivialCoverage status: sufficient in
tests/unittest/llmapi/test_llm_args.py.
test_SchedulerConfig_declarationnow validates both defaultTrueand explicitFalsepropagation to pybind, which is the key contract for this PR in this file. No follow-up needed here.As per path instructions,
tests/**reviews should state whether coverage is sufficient or needs follow-up.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/llmapi/test_llm_args.py` around lines 478 - 503, No changes are needed. The test_SchedulerConfig_declaration function already provides sufficient coverage by validating both the default True value for enable_prefix_aware_scheduling and the explicit False value propagation to pybind through the PybindMirror conversion, which fulfills the key contract requirements for this PR.Source: Path instructions
tests/unittest/bindings/test_executor_bindings.py (1)
2495-2502: 📐 Maintainability & Code Quality | 🔵 TrivialCoverage status: sufficient in
tests/unittest/bindings/test_executor_bindings.py(after the lint fix above).The pickle and nested
ExecutorConfigassertions adequately verify round-trip preservation ofenable_prefix_aware_scheduling; no extra follow-up test is needed for this file.As per path instructions,
tests/**reviews should state whether coverage is sufficient or needs follow-up.Also applies to: 2644-2645, 2703-2703
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/bindings/test_executor_bindings.py` around lines 2495 - 2502, This is a review approval confirming that the test_scheduler_config_pickle function adequately verifies round-trip preservation of the enable_prefix_aware_scheduling attribute through pickle serialization/deserialization with the appropriate assertions on the SchedulerConfig object. No code changes are required as the test coverage is sufficient and meets the stated requirements.Source: Path instructions
tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py (1)
175-211: 📐 Maintainability & Code Quality | 🔵 TrivialCoverage status: sufficient in
tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py.The added cases cover constructor wiring and disabled-prefix-aware budget semantics with explicit assertions; no follow-up coverage is needed in this PR for this file.
As per path instructions,
tests/**reviews should state whether coverage is sufficient or needs follow-up.Also applies to: 1987-2024
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py` around lines 175 - 211, The review comment is confirming that test coverage is sufficient and properly documented per path instructions for test files. No code fix is required - the comment is an approval noting that the test cases in make_scheduler and related tests adequately cover constructor wiring and disabled-prefix-aware budget semantics with explicit assertions, meeting the requirement to document coverage status in tests/** files.Source: Path instructions
tests/unittest/_torch/executor/test_py_scheduler.py (1)
2852-2880: 📐 Maintainability & Code Quality | 🔵 TrivialCoverage status: sufficient in
tests/unittest/_torch/executor/test_py_scheduler.py.This addition directly validates the disabled path (including “must not call prefix-reuse analysis”) and is adequate for this PR scope.
As per path instructions,
tests/**reviews should state whether coverage is sufficient or needs follow-up.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/_torch/executor/test_py_scheduler.py` around lines 2852 - 2880, This review comment is a positive confirmation that the test coverage is sufficient and no fixes are required. The test method test_prefix_aware_scheduling_disabled_does_not_delay_duplicates adequately validates the disabled path for prefix-aware scheduling by ensuring that the analyze_prefix_reuse method is not called when enable_prefix_aware_scheduling is False. No code changes are needed based on this review comment.Source: Path instructions
cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp (1)
321-335: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winGuard remaining prefix-reuse tree walk on disabled mode.
When prefix-aware scheduling is disabled, this block bypasses first-chunk reuse analysis, but the encoder-init cross-summary path still performs
analyzePrefixReuselater even though skip logic is disabled in that mode. Add the same flag guard there to avoid unnecessary radix-tree walks.♻️ Suggested diff
- else if (isEncoderInit && crossKvCacheManager && crossKvCacheManager->isEnableBlockReuse() + else if (mEnablePrefixAwareScheduling && isEncoderInit && crossKvCacheManager + && crossKvCacheManager->isEnableBlockReuse() && !crossKvCacheManager->getBlockManager().isVariableWindow()) { // Encoder admission only needs the cross summary for reuse ordering. auto uniqueTokens = *(req->getEncoderUniqueTokens().value()); crossSummary = crossKvCacheManager->analyzePrefixReuse(uniqueTokens, *req); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp` around lines 321 - 335, The code correctly guards the main kvCacheManager's prefix reuse analysis with mEnablePrefixAwareScheduling flag, but the subsequent block that handles the cross-summary path with crossKvCacheManager still performs analyzePrefixReuse even when scheduling is disabled. Add the same mEnablePrefixAwareScheduling check along with the necessary block reuse and variable window guards before the crossKvCacheManager analyzePrefixReuse call to match the pattern used for kvCacheManager and prevent unnecessary radix-tree walks when prefix-aware scheduling is disabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/tensorrt_llm/executor/schedulerConfig.cpp`:
- Around line 33-37: The SchedulerConfig::operator== method is incomplete
because it does not include mDynamicBatchConfig in its equality comparison.
Currently it only checks mCapacitySchedulerPolicy, mContextChunkingPolicy, and
mEnablePrefixAwareScheduling. Add a comparison of mDynamicBatchConfig to the
return statement using the logical AND operator to ensure that two
SchedulerConfig objects with different dynamic batch configurations will
correctly compare as not equal.
In `@cpp/tests/unit_tests/batch_manager/microBatchSchedulerTest.cpp`:
- Around line 1413-1435: Add a precondition assert after the capacityScheduler
call (which produces scheduled1) to verify that req1 is present in scheduled1
before proceeding to the microBatchScheduler checks. This ensures the subsequent
assertion checking that req1InCtx is false is testing the correct behavior (that
microBatchScheduler filtered out req1 due to token budget constraints) rather
than passing vacuously if req1 was already absent from scheduled1. Use
ASSERT_TRUE or EXPECT_TRUE with std::any_of to confirm req1 exists in scheduled1
before the microBatchScheduler call.
In `@tests/unittest/bindings/test_executor_bindings.py`:
- Around line 1409-1410: The assertions on the
config.dynamic_batch_config.enable_batch_size_tuning and
config.dynamic_batch_config.enable_max_num_tokens_tuning properties are using
explicit `== True` comparisons, which violates the E712 linting rule. Fix this
by removing the `== True` part from both assertions and relying on the
truthiness check of the boolean properties directly. This means simplifying each
assertion to just check the boolean property itself without the explicit
comparison operator.
---
Nitpick comments:
In `@cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp`:
- Around line 321-335: The code correctly guards the main kvCacheManager's
prefix reuse analysis with mEnablePrefixAwareScheduling flag, but the subsequent
block that handles the cross-summary path with crossKvCacheManager still
performs analyzePrefixReuse even when scheduling is disabled. Add the same
mEnablePrefixAwareScheduling check along with the necessary block reuse and
variable window guards before the crossKvCacheManager analyzePrefixReuse call to
match the pattern used for kvCacheManager and prevent unnecessary radix-tree
walks when prefix-aware scheduling is disabled.
In `@tests/unittest/_torch/executor/test_kv_cache_v2_scheduler.py`:
- Around line 175-211: The review comment is confirming that test coverage is
sufficient and properly documented per path instructions for test files. No code
fix is required - the comment is an approval noting that the test cases in
make_scheduler and related tests adequately cover constructor wiring and
disabled-prefix-aware budget semantics with explicit assertions, meeting the
requirement to document coverage status in tests/** files.
In `@tests/unittest/_torch/executor/test_py_scheduler.py`:
- Around line 2852-2880: This review comment is a positive confirmation that the
test coverage is sufficient and no fixes are required. The test method
test_prefix_aware_scheduling_disabled_does_not_delay_duplicates adequately
validates the disabled path for prefix-aware scheduling by ensuring that the
analyze_prefix_reuse method is not called when enable_prefix_aware_scheduling is
False. No code changes are needed based on this review comment.
In `@tests/unittest/bindings/test_executor_bindings.py`:
- Around line 2495-2502: This is a review approval confirming that the
test_scheduler_config_pickle function adequately verifies round-trip
preservation of the enable_prefix_aware_scheduling attribute through pickle
serialization/deserialization with the appropriate assertions on the
SchedulerConfig object. No code changes are required as the test coverage is
sufficient and meets the stated requirements.
In `@tests/unittest/llmapi/test_llm_args.py`:
- Around line 478-503: No changes are needed. The
test_SchedulerConfig_declaration function already provides sufficient coverage
by validating both the default True value for enable_prefix_aware_scheduling and
the explicit False value propagation to pybind through the PybindMirror
conversion, which fulfills the key contract requirements for this PR.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8c226c76-7bfb-4363-98cf-e6d650c30a47
📒 Files selected for processing (25)
cpp/include/tensorrt_llm/batch_manager/capacityScheduler.hcpp/include/tensorrt_llm/executor/executor.hcpp/tensorrt_llm/batch_manager/capacityScheduler.cppcpp/tensorrt_llm/batch_manager/trtEncoderModel.cppcpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cppcpp/tensorrt_llm/executor/schedulerConfig.cppcpp/tensorrt_llm/executor/serialization.cppcpp/tensorrt_llm/nanobind/batch_manager/algorithms.cppcpp/tensorrt_llm/nanobind/executor/executorConfig.cppcpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cppcpp/tests/unit_tests/batch_manager/microBatchSchedulerTest.cppcpp/tests/unit_tests/executor/serializeUtilsTest.cppdocs/source/developer-guide/telemetry.mddocs/source/features/kvcache.mdreviews/designs/enable_prefix_aware_scheduling.mdtensorrt_llm/_torch/pyexecutor/_util.pytensorrt_llm/_torch/pyexecutor/scheduler/scheduler.pytensorrt_llm/_torch/pyexecutor/scheduler/scheduler_v2.pytensorrt_llm/llmapi/llm_args.pytensorrt_llm/usage/llm_args_golden_manifest.jsontensorrt_llm/usage/schemas/README.mdtests/unittest/_torch/executor/test_kv_cache_v2_scheduler.pytests/unittest/_torch/executor/test_py_scheduler.pytests/unittest/bindings/test_executor_bindings.pytests/unittest/llmapi/test_llm_args.py
61321d5 to
3b04d8c
Compare
|
PR_Github #55084 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #55324 [ run ] triggered by Bot. Commit: |
|
PR_Github #55324 [ run ] completed with state
|
tongyuantongyu
left a comment
There was a problem hiding this comment.
LGTM.
Nit: the name enable_prefix_aware_scheduling feels a bit redundant. prefix_aware is probably enough here.
e11a9ed to
9ac53b3
Compare
chang-l
left a comment
There was a problem hiding this comment.
Approval for doc changes.
|
/bot run |
|
PR_Github #56407 [ run ] triggered by Bot. Commit: |
|
PR_Github #56407 [ run ] completed with state
|
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
9ac53b3 to
bcd57ce
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #56451 [ run ] triggered by Bot. Commit: |
|
PR_Github #56451 [ run ] completed with state
|
Summary by CodeRabbit
New Features
enable_prefix_aware_schedulingconfiguration option to the scheduler, allowing fine-grained control over prefix-aware scheduling behavior for KV cache optimization (enabled by default).Documentation
Description
Test Coverage
C++ focused gtests:
CapacitySchedulerTest.PrefixAwareSchedulingDisabledDoesNotDelayDuplicateRequest
CombinedSchedulerTest.PrefixAwareSchedulingDisabledKeepsReusableTokensZero
SerializeUtilsTest.SchedulerConfig
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.